perf: early termination for right semi/anti hash joins#21775
perf: early termination for right semi/anti hash joins#21775SubhamSinghal wants to merge 3 commits intoapache:mainfrom
Conversation
|
run benchmark tpch10 |
|
run benchmark tpch_mem |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing semi-anti-early-termination (f436877) to 64619a6 (merge-base) diff using: tpch10 File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing semi-anti-early-termination (f436877) to 64619a6 (merge-base) diff using: tpch_mem File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch_mem — base (merge-base)
tpch_mem — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch10 — base (merge-base)
tpch10 — branch
File an issue against this benchmark runner |
|
TPC-H Benchmark @SubhamSinghal this looks too fast to be correct? |
|
@Dandandan might be some mistake on my side for calculating benchmark. but the optimisation looks fine right. Planning to file follow-up PR if direction looks right. |
|
run benchmark tpcds |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing semi-anti-early-termination (f436877) to 64619a6 (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
EeshanBembi
left a comment
There was a problem hiding this comment.
The optimization is correct and the existing test suite validates semantics. A couple of non-blocking observations below.
| hash_value: u64, | ||
| predicate: &mut dyn FnMut(u64) -> bool, | ||
| ) -> Option<u64> { | ||
| let next: Vec<u64> = self.next.iter().copied().collect(); |
There was a problem hiding this comment.
Minor: PruningJoinHashMap::get_first_match allocates a Vec<u64> on every invocation:
let next: Vec<u64> = self.next.iter().copied().collect();
get_first_match_impl::<u64>(&self.map, &next, hash_value, predicate)PruningJoinHashMap isn't used in the standard HashJoinStream path, so this is fine today. But if this method is ever hot in a symmetric hash join the allocation will show up. Consider having get_first_match_impl accept &[T] (or a generic impl AsRef<[T]>) directly — then SmallJoinHashMap can pass self.next.as_slice() with zero copy and PruningJoinHashMap can do the same once it stores a plain Vec<u64>.
There was a problem hiding this comment.
addressed comment
EeshanBembi
left a comment
There was a problem hiding this comment.
Test coverage suggestion (non-blocking): the optimization targets the scenario where one key has many duplicate rows in the build side (so the chain walk terminates early). A test with high-cardinality duplicates (e.g. 1 key with 1,000 duplicate rows) would explicitly document that scenario and guard against future regressions.
Which issue does this PR close?
Rationale for this change
For RightSemi/RightAnti hash joins, the probe phase only needs to know whether at least one matching build row exists per probe row. However, the current code:
(probe, build)pairequal_rows_arron all pairs (take()+eq_dyn_null(), allocating intermediate arrays)get_semi_indicesto keep just the first match per probe rowFor skewed data with N duplicate build rows per key, this produces N pairs per probe row only to keep 1 — O(N) wasted work in both chain traversal and equality verification.
What changes are included in this PR?
New
get_first_matchmethod onJoinHashMapTypetrait (join_hash_map.rs):Option<u64>— the matching build row index, orNoneget_first_match_impl, reused byJoinHashMapU32,JoinHashMapU64, andPruningJoinHashMapNew
process_probe_batch_right_semi_antifunction (stream.rs):get_first_matchwith an equality predicate built fromJoinKeyComparator::is_equal(zero-allocation per-row comparison)build_batch_from_indicesAre these changes tested?
Covered by existing tests:
No new tests added — this is a pure optimization that produces identical results. The existing test suite comprehensively covers RightSemi/RightAnti join.
Are there any user-facing changes?
No.
TPC-H Benchmark result: